ABSTRACT


Data contained in a histogram can be compressed by
calculating and transmitting two quantities: the area of the
histogram and the sum of the squares of the histogram bars.
The paper presents a survey of what knowledge one has about
the original histogram when given only these two quantities.
A set of theorems is derived which indicates the magnitude
limits of the individual bars of the histogram as a function
of these two quantities. The technique results in a data
compression factor of greater than 10 for certain scientific
experiments where the only information required is the
amplitude distribution of the individual histogram bars.

A STATISTICAL DATA COMPRESSION TECHNIQUE


by 

James W. Snively, Jr. 
Goddard Space Flight Center 


INTRODUCTION 

Every person involved in spacecraft telemetry soon realizes that the amount of data desired or
collected often exceeds the amount that can be transmitted. One method used to measure a
phenomenon is to collect the data in the form of a histogram. This histogram can take a form such
as particle detector counts versus the angular position of the sensor. It takes many bits to send
back a complete histogram involving large numbers of counts, often more bits than are available.
One way of solving this problem is to have equipment in the satellite process the incoming data and
transmit only the results of its calculations. This paper presents the mathematical basis for a novel
type of data compression that involves the transmission of only the area of a histogram and certain
digits of the sum of the squares of the histogram bars. The amount of information about the
histogram that can be recovered from these two quantities is surprising. For example, Figure 1
shows a pair of typical histograms expected in a plasma-measuring experiment. Knowledge of the
total counts over all azimuthal angles, combined with knowledge of the sum of the squares of the
counts in each 22.5° interval, will readily distinguish interplanetary space curves from transition
region curves and, in fact, allows one to make much finer distinctions.


TECHNIQUE 

Relationship between Area and Sum 
of Squares for a Histogram 

Assume that the area, A, of a histogram is known. The first theorem of the paper (Appendix)
states that the sum of the squares of the histogram bars, $\Sigma$, must lie between two numbers
which differ by a factor equal to the number of bars, n, in the histogram. The larger of these two
limits for the sum of the squares corresponds to a histogram where all but one of the bars contain
no counts. Therefore, this larger limit for the sum of the squares is simply the square of the area,
$A^2$. The smaller limit for the sum of the squares corresponds to a histogram where all of the bars
have equal numbers of counts. In this case, the lower limit turns out to be $A^2/n$.

Figure 1. Typical histograms expected in a plasma-measuring experiment.

These facts, illustrated in Figure 2, show the relationship between the area and the sum of the
squares for histograms of sixteen bars. Sixteen-bar histograms (with area less than $2^{19}$ counts)
lie within the cross-hatched region between the two parallel lines. For example, a sixteen-bar
histogram with an area of $2^6$ must have its sum of squares between $2^8$ and $2^{12}$. This absolute
constraint is taken advantage of in the telemetry technique we wish to describe.
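
The two limits can be checked numerically. The following is a minimal sketch, not part of the flight equipment; the function names are illustrative assumptions.

```python
def sum_of_squares(counts):
    """Sum of the squares of the histogram bars."""
    return sum(c * c for c in counts)


def sum_of_squares_bounds(area, n_bars):
    """Return the (lower, upper) limits on the sum of squares for a known area.

    lower = A^2 / n  (all bars equal)
    upper = A^2      (all counts in a single bar)
    """
    return area ** 2 / n_bars, area ** 2


# The sixteen-bar example from the text: A = 2^6.
low, high = sum_of_squares_bounds(2 ** 6, 16)
print(low, high)  # 256.0 4096, i.e. 2^8 and 2^12

flat = [4] * 16            # all bars equal, area 64
peaked = [64] + [0] * 15   # all counts in one bar, area 64
print(sum_of_squares(flat), sum_of_squares(peaked))
```

The flat and the single-bar histograms reach the lower and upper limits respectively; every other sixteen-bar histogram of area 64 falls in between.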


Hardware Use of this Relationship 

If the quantity A is known, then according to the preceding section, the sum of squares,
$\Sigma$, is known to within a factor of n. In other words, the location of the most significant bit
of $\Sigma$ is known to within $\log_2 n$ bit positions. Specifically, for a sixteen-bar histogram the
most significant bit must lie within one of four positions. If, for this sixteen-bar histogram, A is
$2^6$ counts, then the most significant bit of $\Sigma$ is either the 9th, 10th, 11th, or 12th bit of
the word. The telemetry technique being expounded is simply the transmission of A and
$\log_2 n$ bits of $\Sigma$. More bits of $\Sigma$ can be transmitted if still more accuracy is desired.

This study was conducted for use with a spacecraft-borne plasma-measuring experiment
(Reference 1) that will be flown on future Interplanetary Monitoring Platform (IMP) satellites.
The histogram collected in this experiment contains sixteen bars, each containing the number of
counts produced by a plasma detector during one-sixteenth of a spin of the satellite. Figure 3
shows a block diagram of the equipment designed to implement this telemetry technique in this
particular application.

The area counter at the bottom of the diagram commutates four bits of the sum of squares
counter to the telemetry system. Note that in this application a logarithmic counter
(Reference 2) is used for the area determination. Therefore, although the total area of the
collected histogram can be as large as $2^{19}$ counts, only eight bits are required to represent
this number to a ±3 percent accuracy.

Figure 2. Relationship between area and sum of squares for a histogram with 16 bars.

Transmission of the twelve indicated bits 
allows one to determine the sum of the squares 
to at worst ±33 percent. This error occurs 
when the four commutated bits of the sum of 
squares counter are 0001. When these four bits 
are 1111 the worst error is ±3.3 percent. 

The remainder of this paper is theoretical. 
It assumes that the area and sum of squares are 
known exactly. Consequences of such conditions 
will be discussed. The consequences obtained 
may be applied to cases where the area and 
sum of squares are not known exactly. 

Figure 3. Block diagram of processing equipment.


INFORMATION CONTENT OF THE AREA AND THE SUM OF SQUARES 
Definition of Ratio, r 

A quantity which we shall call the "ratio," denoted by the symbol r, is defined as

\[
  r = \frac{n \sum_{i=1}^{n} c_i^2}{\left( \sum_{i=1}^{n} c_i \right)^2} . \tag{1}
\]

Here n denotes the number of bars in the histogram and $c_i$ denotes the number of counts in the
i-th largest bar. This parameter is related to the mean, $\mu$, and the variance, $\sigma^2$, of the
bar heights by the equation

\[
  r = 1 + \frac{\sigma^2}{\mu^2} , \tag{2}
\]

but has the advantage of only assuming values between 1 and n. For this reason, it is a useful
parameter for describing various properties of histograms.
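
As a quick numerical check of the definition, the sketch below computes r directly from Equation 1 and compares it with the mean and variance form. It assumes the variance is the population variance of the n bar heights (divisor n); the sample counts are made up for illustration.

```python
from statistics import mean, pvariance


def ratio(counts):
    """Ratio r = n * (sum of squares) / (area squared), as in Equation 1."""
    n = len(counts)
    area = sum(counts)
    return n * sum(c * c for c in counts) / area ** 2


counts = [9, 7, 4, 2, 2]            # hypothetical five-bar histogram
r = ratio(counts)
mu = mean(counts)
sigma_squared = pvariance(counts)   # population variance (divisor n)

print(r, 1 + sigma_squared / mu ** 2)   # the two values agree
print(ratio([3] * 5))                   # equal bars: r = 1
print(ratio([7, 0, 0, 0, 0]))           # a single non-zero bar: r = n
```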


Peak Height 

The value of the largest bar of a histogram cannot be greater than the area, A, nor smaller than
one n-th of the area, A/n, where n is the number of bars in the histogram. If the largest bar has
the value A, then all other bars must have the value zero. If the largest bar has the value A/n,
all others must have this same value. These two cases correspond to ratios of n and of 1
respectively. These are the extreme cases.

For any given ratio the largest bar of a histogram can only assume values in a subinterval of
the interval between A/n and A. For a ratio of r the greatest possible value for the largest
histogram bar is given by Theorem 2 in the appendix:

\[
  \frac{A}{n} \left[ 1 + \sqrt{(n-1)(r-1)} \right] . \tag{3}
\]

This expression reduces to A when r = n and to A/n when r = 1. Therefore, the expression for the
largest histogram bar agrees with the known histograms for the extreme values of r.

If the largest bar of a histogram with ratio r is equal to the value given by Expression 3, that
histogram in this special case is completely determined. In fact, the theorem states that all of the
remaining n - 1 bars are equal. Therefore, these bars must be equal to the value of the expression

\[
  \frac{A}{n} \left[ 1 - \sqrt{\frac{r-1}{n-1}} \right] , \tag{4}
\]

which is merely the area not included in the large bar divided by n - 1. Figure 4 shows several
five-bar histograms. Each histogram has the greatest largest bar possible for the indicated r.
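
The extreme histogram described above can be built and checked numerically. This is a sketch under the assumption that the upper limit is (A/n)[1 + sqrt((n-1)(r-1))], as derived in Theorem 2; the function names are illustrative.

```python
import math


def max_largest_bar(area, n, r):
    """Greatest possible value of the largest bar (Theorem 2)."""
    return area / n * (1 + math.sqrt((n - 1) * (r - 1)))


def extremal_histogram(area, n, r):
    """One bar at the Theorem 2 limit; the remaining n - 1 bars share the rest equally."""
    big = max_largest_bar(area, n, r)
    small = (area - big) / (n - 1)
    return [big] + [small] * (n - 1)


def ratio(counts):
    return len(counts) * sum(c * c for c in counts) / sum(counts) ** 2


hist = extremal_histogram(100.0, 5, 2.0)
print(hist)          # one bar of 60 and four bars of 10
print(ratio(hist))   # recovers the ratio 2 that was asked for
```

Building the histogram and recomputing its ratio confirms that the limit is attainable, not merely an upper bound.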

Note that the largest bar decreases as r decreases and that the base of the histogram increases
as r decreases.

Figure 4. Histograms (n = 5) with the largest peak height for the indicated ratios.

Figure 5. A segment of the real number line illustrating the definition of s-intervals.

The ratio r of any histogram must lie in the interval between 1 and n. Divide this interval
into n - 1 subintervals (or s-intervals) and label these intervals from 2 to n as shown in
Figure 5. Any subinterval s contains ratios between n/s and n/(s - 1). The third theorem in the
appendix states that this integer s is the fewest number of non-zero bars possible for histograms
with ratios in the s-interval. For example, if r = 1 then s equals n, since 1 lies in the interval
bounded by 1 and n/(n - 1). Therefore, n bars must have non-zero values for a ratio of 1. Another
example is that if n = 16 and the ratio equals 5, then since 5 lies between 16/4 and 16/3, s
equals 4. Therefore this sixteen-bar histogram must have at least four non-zero bars. There is no
corresponding upper restriction, other than n itself, on how many non-zero bars a histogram may
have.
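
The s-interval lookup amounts to rounding n/r up to the next integer. The sketch below makes that assumption explicit; exact fractions are used so that ratios falling exactly on an interval boundary are not disturbed by floating-point rounding. Note that at r = n exactly it returns 1 (a single bar), whereas the labeling in the text assigns r = n to the s = 2 interval.

```python
import math
from fractions import Fraction


def min_nonzero_bars(n, r):
    """Fewest non-zero bars a histogram of n bars can have at ratio r (Theorem 3)."""
    return math.ceil(Fraction(n) / Fraction(r))


print(min_nonzero_bars(16, 5))                # 4, the example in the text
print(min_nonzero_bars(16, 1))                # 16: every bar must be non-zero
print(min_nonzero_bars(16, Fraction(16, 3)))  # 3: three equal bars give r = 16/3
```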

The smallest possible value for the largest bar is shown in Theorem 4 in the appendix to be

\[
  \frac{A}{s} \left[ 1 + \sqrt{\frac{rs - n}{n(s-1)}} \right] . \tag{5}
\]

When r = n we have s = 2, and the expression reduces to A. When r = 1 we have s = n and the
expression reduces to A/n. Therefore, this expression agrees with the known histograms for the
extreme values of r.

If the largest bar of a histogram with ratio r is equal to the value given by Expression 5,
that histogram in this special case is completely determined. From the proof of Theorem 4 we
discover that it is a histogram with s non-zero bars, s - 1 of which are equal to the value of
Expression 5. The remaining bar has the value

\[
  \frac{A}{s} \left[ 1 - \sqrt{\frac{(s-1)(rs - n)}{n}} \right] . \tag{6}
\]

Figure 6 shows several five-bar histograms. Each histogram has the smallest largest bar possible
for the indicated r. Note that the largest bars decrease as r decreases and that the smaller bar
increases as r decreases until it equals the larger bars. Then as r continues to decrease a new
bar increases from zero until it equals the larger bars again. This process continues until
we have n equal bars.

Figure 7 illustrates for sixteen bar histo- 
grams how the bounds on the largest histogram 
bar vary with the ratio. All histograms must 
lie within the cross-hatched region between the 
two bounds. 



Figure 6. Histograms (n = 5) with the smallest peak height for indicated ratios.

Figure 7. Bounds for the largest histogram bar (n = 16).

Figure 8 is a plot for sixteen-bar histograms of the largest possible plus-or-minus percent
error to which the largest histogram bar is known as a function of the ratio. Note that for
the sharp-peaked curves the percentage error is quite small. The maximum percentage error of
±43 percent occurs for a histogram with a ratio of slightly less than two. This corresponds to a
flat curve where the peak height is of lesser significance than for the very peaked curves of
the higher ratios.
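
The two bounds on the largest bar, and the resulting uncertainty, can be scanned numerically. The sketch below is an illustration only: the bound formulas follow Theorems 2 and 4, while the error measure (hi - lo)/(hi + lo) is an assumption about how the plus-or-minus percentage was defined, and the grid of ratios is arbitrary.

```python
import math


def max_largest_bar(area, n, r):
    """Upper bound on the largest bar (Theorem 2)."""
    return area / n * (1 + math.sqrt((n - 1) * (r - 1)))


def min_largest_bar(area, n, r):
    """Lower bound on the largest bar (Theorem 4), with s chosen from the s-interval."""
    s = max(2, math.ceil(n / r))
    # Clamp tiny negative values caused by floating-point rounding at interval edges.
    inside = max(0.0, (r * s - n) / (n * (s - 1)))
    return area / s * (1 + math.sqrt(inside))


def percent_error(lo, hi):
    """Assumed definition: half-width of [lo, hi] relative to its midpoint, in percent."""
    return 100 * (hi - lo) / (hi + lo)


n = 16
grid = [1 + k / 1000 for k in range(1, 15000)]   # r from 1.001 to 15.999
errors = [percent_error(min_largest_bar(1.0, n, r), max_largest_bar(1.0, n, r))
          for r in grid]
worst = max(errors)
r_worst = grid[errors.index(worst)]
print(round(worst, 1), r_worst)
```

Under these assumptions the scan gives a worst case of roughly 43 percent at a ratio below two, in line with the figure quoted above, and the error shrinks toward zero as r approaches either 1 or n.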

Amplitude Distribution 

Another quantity which can be deduced from a knowledge of the area and the sum of squares of
a histogram is the amplitude distribution. In fact, Theorems 5 and 6 give bounds for each bar of a
histogram that are similar to those for the largest one. Let $c_1$ refer to the largest bar of a
histogram, $c_2$ to the second largest bar, etc., so that $c_n$ refers to the n-th largest bar of
the histogram. In this notation, for a histogram with n bars, $c_n$ is the smallest bar of the
histogram.

Theorem 5 in the appendix gives the largest possible value for the p-th largest bar of a
histogram with area, A, and ratio, r. This value must be computed in one of two ways depending on
the value of the ratio. If the ratio is between n/p and n, the largest possible value for the p-th
largest bar is given by

\[
  \frac{A}{p} \left[ 1 - \sqrt{\frac{pr - n}{n(p-1)}} \right] . \tag{7}
\]

If the ratio is between 1 and n/p, the largest possible value for the p-th largest bar is given by

\[
  \frac{A}{n} \left[ 1 + \sqrt{\frac{(n-p)(r-1)}{p}} \right] . \tag{8}
\]

Figure 8. Percentage error in the largest histogram bar.

If p = 1, the entire range of ratios is covered by Expression 8, which reduces to Expression 3
when 1 is substituted for p. Thus it follows that Theorem 2 is a corollary to Theorem 5.

When the ratio is equal to n/p (except for p = 1) both Expressions 7 and 8 give identical results,
namely, that the largest possible value for the p-th largest histogram bar is exactly A/p. This
value is the largest possible value that the p-th largest bar of a histogram can have for any ratio.
It corresponds to a histogram with exactly p equal, non-zero bars.
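
The two-branch rule for the upper bound can be written as a single function. This is a sketch assuming the forms of Expressions 7 and 8 that follow from the lemma in the appendix (one bar at the upper level for Expression 7, p bars at the upper level for Expression 8); the function name is illustrative.

```python
import math


def max_pth_bar(area, n, r, p):
    """Largest possible value of the p-th largest bar for area A and ratio r (Theorem 5)."""
    if p > 1 and r * p >= n:
        # n/p <= r <= n: Expression 7.
        inside = max(0.0, (p * r - n) / (n * (p - 1)))
        return area / p * max(0.0, 1 - math.sqrt(inside))
    # 1 <= r <= n/p (and every r when p = 1): Expression 8.
    return area / n * (1 + math.sqrt((n - p) * (r - 1) / p))


n = 16
print(max_pth_bar(1.0, n, 4.0, 4))    # r = n/p: the bound peaks at A/p = 0.25
print(max_pth_bar(1.0, n, 16.0, 2))   # r = n: only one bar is non-zero, so 0
print(max_pth_bar(1.0, n, 1.0, 3))    # r = 1: all bars equal A/n
```

At r = n/p both branches agree, which is why the switch between them can be made on either side of the boundary.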

Theorem 6 in the appendix gives the smallest possible value for the p-th largest bar of a
histogram with area, A, and ratio, r, for those cases not covered by previous theorems.
(Remember that Theorem 4 gives the smallest possible value for the largest bar, as in Expression
5, and that when the ratio is between n/(p - 1) and n, the smallest possible value for the p-th bar
is zero from Theorem 3.) According to Theorem 6, if the ratio is between 1 and n/(p - 1), the
smallest possible value for the p-th largest histogram bar (p ≥ 2) is

\[
  \frac{A}{n} \left[ 1 - \sqrt{\frac{(p-1)(r-1)}{n-p+1}} \right] . \tag{9}
\]

When the ratio is 1, Expression 9 reduces to A/n. This corresponds to the known extreme
histogram for this ratio, namely, the one with n equal bars.

When the ratio is n/(p - 1), Expression 9 reduces to zero. This is in agreement with the result of
Theorem 3.
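
The lower bounds from Theorems 3, 4, and 6 can be collected in one function. The sketch below assumes the lower bound for p ≥ 2 is (A/n)[1 - sqrt((p-1)(r-1)/(n-p+1))] below r = n/(p-1) and zero above it, and that the p = 1 case follows Theorem 4; the function name is illustrative.

```python
import math


def min_pth_bar(area, n, r, p):
    """Smallest possible value of the p-th largest bar for area A and ratio r."""
    if p == 1:
        # Theorem 4: lower bound on the largest bar.
        s = max(2, math.ceil(n / r))
        inside = max(0.0, (r * s - n) / (n * (s - 1)))
        return area / s * (1 + math.sqrt(inside))
    if r * (p - 1) >= n:
        # Theorem 3: p - 1 non-zero bars suffice, so the p-th bar may be empty.
        return 0.0
    # Theorem 6: 1 <= r <= n/(p - 1).
    inside = (p - 1) * (r - 1) / (n - p + 1)
    return area / n * max(0.0, 1 - math.sqrt(inside))


n = 16
print(min_pth_bar(1.0, n, 1.0, 5))   # r = 1: every bar equals A/n
print(min_pth_bar(1.0, n, 8.0, 3))   # r = n/(p - 1): the bound reaches zero
print(min_pth_bar(1.0, n, 2.0, 2))   # an intermediate case
```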

Figure 9 illustrates how the bounds on some of the histogram bars vary with ratio for the case
where n = 16. Note that bounds on the largest bar were already illustrated in Figure 7. For bars
other than the largest, the least upper bound for the p-th largest bar increases from A/n when
r = 1 to A/p when r = n/p, and then decreases to zero when r = n, and the greatest lower bound
decreases from A/n when r = 1 to zero when r = n/(p - 1). The two extreme cases r = 1 and r = n
are the only completely determined ones.

Figure 9. Bounds for (a) the second largest histogram bar, $c_2$ (n = 16), (b) the third largest
histogram bar, $c_3$ (n = 16), and (c) the p-th largest histogram bar, $c_p$ (n = 16).

Figure 10 illustrates the consequences of these results for the case where n = 16. For ratios 
with integer values, the bounds for each histogram bar are drawn with the largest bar on the left 
and the smallest bar on the right. In this form one can see at a glance how the shape of the result- 
ing histograms varies as the ratio changes. Note that histograms with larger ratios are much 
steeper and narrower than those with lower ratios. 


FURTHER HARDWARE CONSIDERATIONS 

In the preceding section the information content of the area and sum of squares of a histogram
was discussed. The discussion was based on these quantities being known exactly. In practice,
however, as in the IMP plasma experiment, these quantities will not be known exactly, but for a
given set of bits these quantities will be known to lie between certain known bounds. For example,
suppose the area, A, is known to lie between $A_{\min}$ and $A_{\max}$ and the sum of squares,
$\Sigma$, is known to lie between $\Sigma_{\min}$ and $\Sigma_{\max}$. Then the ratio, r, must
satisfy the following bounds:

\[
  \frac{n \Sigma_{\min}}{A_{\max}^2} \le r \le \frac{n \Sigma_{\max}}{A_{\min}^2} . \tag{10}
\]

Similarly, bounds for any bar of the histogram could be found by choosing the largest upper bound
and the smallest lower bound for the bar from among those for all ratios in the range of
Expression 10.
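
The interval arithmetic behind these bounds is short enough to sketch. The function below pairs the smallest sum of squares with the largest area for the lower limit, and the reverse for the upper limit; the additional clamp to the range 1 to n is an assumption based on the fact that no histogram can have a ratio outside that range. The numerical inputs are hypothetical.

```python
def ratio_bounds(n, area_min, area_max, sumsq_min, sumsq_max):
    """Bounds on r when the area and the sum of squares are known only within limits."""
    r_low = n * sumsq_min / area_max ** 2
    r_high = n * sumsq_max / area_min ** 2
    # No histogram has a ratio outside [1, n], so the limits can be tightened to it.
    return max(1.0, r_low), min(float(n), r_high)


# Hypothetical limits for a sixteen-bar histogram.
print(ratio_bounds(16, 90, 110, 2000, 3000))
# Exactly known quantities collapse the interval to a single ratio.
print(ratio_bounds(16, 100, 100, 2500, 2500))   # (4.0, 4.0)
```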


Using the flight hardware for the IMP plasma experiment, designed for a maximum number of
counts in any histogram bar of $2^{17}$ and a maximum total area over 16 bars of $2^{19}$ counts,
the ratio, r, is always determined to better than ±40.0 percent of its range, but on the average it
is determined to about ±12.0 percent.

Figure 11 shows a segment of the output of a computer program which relates the output of 
the IMP flight hardware to the input data which produced the specified output. For example, if the 
log counter output is 189 (275 octal) the area of the input histogram lies between 29,696 counts and 
30,719 counts. If, furthermore, the squarer output is 12, the ratio of the input histogram must be 
between 13.64 and 15.85 and the largest bar of this input histogram is between 27,312 counts and 
30,571 counts. For each of these quantities the harmonic mean (H.M.) of the range and the maximum 
± percentage error (P.E.) are also listed. 
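
The two summary statistics can be reproduced from the end points of a range. In the sketch below the harmonic mean is the standard one; the percentage error formula (hi - lo)/(hi + lo) is an inference, chosen because it makes the error symmetric about the harmonic mean and agrees with legible entries of the printout, not a definition stated in the text.

```python
def harmonic_mean(lo, hi):
    """Harmonic mean of the two end points of a range."""
    return 2 * lo * hi / (lo + hi)


def percent_error(lo, hi):
    """Assumed P.E.: the error of the harmonic mean relative to either end point."""
    return 100 * (hi - lo) / (hi + lo)


# Area range for a log counter output of 189, as quoted in the text.
print(round(harmonic_mean(29696, 30719), 2))   # 30198.84, the H.M. listed for this range
# Largest-bar range quoted in the text for a squarer output of 12.
print(harmonic_mean(27312, 30571), percent_error(27312, 30571))
```

With the harmonic mean H as the estimate, (H - lo)/lo and (hi - H)/hi are equal, which is presumably why it was preferred over the arithmetic mean for a plus-or-minus percentage.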

Figure 11. Segment of the output of a computer program relating output of IMP hardware to the input data which produced it.













SUMMARY AND CONCLUSIONS 


This paper has shown how transmission of the area of a histogram and certain bits of the sum 
of squares enables one to recover much of the original histogram. It has shown how knowledge of 
the area of the histogram implies a restriction on the sum of squares of the histogram. This fact 
is the backbone of the entire process for it enables the sum of squares to be transmitted with a 
smaller number of bits if the area is also transmitted than if it is not. The paper has also shown 
how knowledge of both the area and sum of squares of a histogram implies restrictions on each of 
the bars of the histogram. 

As spacecraft travel farther away from the earth, the need for on-board processing of data 
will increase. The IMP plasma experiment is an example of a situation where onboard processing 
is necessary in order to be able to accomplish desired goals. The computation discussed in this 
paper has been able to increase the effective amount of information that can be transmitted to Earth 
from this experiment by more than an order of magnitude. 


(Manuscript received October 7, 1965) 

Appendix A


Mathematical Development 


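Theorem 1

Let n be a positive integer and let $c_i$ (i = 1, 2, ..., n) be non-negative real numbers. Then

\[
  \frac{1}{n} \left( \sum_{i=1}^{n} c_i \right)^2 \le \sum_{i=1}^{n} c_i^2 \le \left( \sum_{i=1}^{n} c_i \right)^2 . \tag{A1}
\]

Proof: One form of the well-known Schwarz inequality is

\[
  \left( \sum_{i=1}^{n} a_i b_i \right)^2 \le \left( \sum_{i=1}^{n} a_i^2 \right) \left( \sum_{i=1}^{n} b_i^2 \right)
\]

for any arbitrary real numbers $a_1, a_2, \dots, a_n, b_1, b_2, \dots, b_n$. Choosing
$a_1 = a_2 = \dots = a_n = 1$ and $b_i = c_i$ (i = 1, 2, ..., n) leads to the following result:

\[
  \left( \sum_{i=1}^{n} c_i \right)^2 \le n \sum_{i=1}^{n} c_i^2 ,
\]

from which the left-hand inequality of Equation A1 follows by division by n. Next the following
equality shall be established for n ≥ 2 by mathematical induction:

\[
  \left( \sum_{i=1}^{n} c_i \right)^2 = \sum_{i=1}^{n} c_i^2 + 2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} c_i c_j . \tag{A2}
\]

If n = 2, Equation A2 becomes $(c_1 + c_2)^2 = c_1^2 + 2 c_1 c_2 + c_2^2$, which is clearly true.
Assume Equation A2 holds for n. Then

\[
  \left( \sum_{i=1}^{n+1} c_i \right)^2 = \left( \sum_{i=1}^{n} c_i + c_{n+1} \right)^2
  = \left( \sum_{i=1}^{n} c_i \right)^2 + 2 c_{n+1} \sum_{i=1}^{n} c_i + c_{n+1}^2 .
\]

Apply the induction hypothesis to the first term of the last expression:

\[
  \left( \sum_{i=1}^{n+1} c_i \right)^2 = \sum_{i=1}^{n} c_i^2 + 2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} c_i c_j
  + 2 c_{n+1} \sum_{i=1}^{n} c_i + c_{n+1}^2 ,
\]

whence

\[
  \left( \sum_{i=1}^{n+1} c_i \right)^2 = \sum_{i=1}^{n+1} c_i^2 + 2 \sum_{i=2}^{n+1} \sum_{j=1}^{i-1} c_i c_j .
\]

Thus, if Equation A2 is true for n it is also true for n + 1; since it is true for 2, it must be
true for all n ≥ 2. Now by hypothesis $c_i \ge 0$, therefore $c_i c_j \ge 0$ and

\[
  2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} c_i c_j \ge 0 .
\]

Combining this last inequality with Equation A2 yields the right-hand inequality of Equation A1.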


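Theorem 2

Let

\[
  \sum_{i=1}^{n} c_i = A \qquad \text{and} \qquad \sum_{i=1}^{n} c_i^2 = \frac{r A^2}{n} ;
\]

then $c_1 \le \frac{A}{n} \left[ 1 + \sqrt{(n-1)(r-1)} \right]$.

Proof: The second hypothesis may be rewritten

\[
  n \left( c_1^2 + \sum_{i=2}^{n} c_i^2 \right) = r \left( c_1 + \sum_{i=2}^{n} c_i \right)^2 .
\]

From Theorem 1 it follows that

\[
  \frac{1}{n-1} \left( \sum_{i=2}^{n} c_i \right)^2 \le \sum_{i=2}^{n} c_i^2 .
\]

Combining the last two relations and transposing to one side gives

\[
  (n - r) c_1^2 - 2 r c_1 \sum_{i=2}^{n} c_i + \left( \frac{n}{n-1} - r \right) \left( \sum_{i=2}^{n} c_i \right)^2 \le 0 .
\]

Since

\[
  A = \sum_{i=1}^{n} c_i = c_1 + \sum_{i=2}^{n} c_i ,
\]

the last expression becomes

\[
  (n - r) c_1^2 - 2 r c_1 (A - c_1) + \left( \frac{n}{n-1} - r \right) (A - c_1)^2 \le 0 ,
\]
\[
  \frac{n^2}{n-1} c_1^2 - \frac{2n}{n-1} c_1 A + \left( \frac{n}{n-1} - r \right) A^2 \le 0 ,
\]
\[
  n^2 c_1^2 - 2 n c_1 A + [n - r(n-1)] A^2 \le 0 ,
\]
\[
  n^2 c_1^2 - 2 n c_1 A + A^2 \le (n-1)(r-1) A^2 ,
\]
\[
  (n c_1 - A)^2 \le (n-1)(r-1) A^2 ,
\]

and

\[
  n c_1 - A \le \sqrt{(n-1)(r-1)} \, A ,
\]
\[
  c_1 \le \frac{A}{n} \left[ 1 + \sqrt{(n-1)(r-1)} \right] .
\]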


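Theorem 3

Let s denote the number of non-zero bars in a histogram. Then for this histogram the ratio

\[
  r = \frac{n \sum_{i=1}^{n} c_i^2}{A^2} \ge \frac{n}{s} .
\]

In particular it follows that

\[
  s \ge \frac{n}{r} .
\]

Proof: Assume without loss of generality that $c_{s+1} = c_{s+2} = \dots = c_n = 0$ and let
$c_i = a_i c_1$ for i = 2, 3, ..., s. Then

\[
  r = \frac{n \left[ c_1^2 + (a_2 c_1)^2 + (a_3 c_1)^2 + \dots + (a_s c_1)^2 \right]}
           {\left[ c_1 + (a_2 c_1) + (a_3 c_1) + \dots + (a_s c_1) \right]^2}
    = \frac{n \left( 1 + a_2^2 + a_3^2 + \dots + a_s^2 \right)}{\left( 1 + a_2 + a_3 + \dots + a_s \right)^2} .
\]

It is asserted that this expression for r assumes a minimum when $a_2 = a_3 = \dots = a_s = 1$. To
show this let $a_i = 1 + \epsilon_i$ (i = 2, 3, ..., s). Then

\[
  r = \frac{n \left[ 1 + (1 + \epsilon_2)^2 + (1 + \epsilon_3)^2 + \dots + (1 + \epsilon_s)^2 \right]}
           {\left[ 1 + (1 + \epsilon_2) + (1 + \epsilon_3) + \dots + (1 + \epsilon_s) \right]^2}
    = \frac{n \left( s + 2 \sum_{i=2}^{s} \epsilon_i + \sum_{i=2}^{s} \epsilon_i^2 \right)}
           {\left( s + \sum_{i=2}^{s} \epsilon_i \right)^2}
    = \frac{n}{s} \left[ 1 + \frac{s \sum_{i=2}^{s} \epsilon_i^2 - \left( \sum_{i=2}^{s} \epsilon_i \right)^2}
                                  {\left( s + \sum_{i=2}^{s} \epsilon_i \right)^2} \right] .
\]

But from Schwarz' inequality (which doesn't depend on the $\epsilon_i$'s being positive) it follows
that

\[
  \left( \sum_{i=2}^{s} \epsilon_i \right)^2 \le (s - 1) \sum_{i=2}^{s} \epsilon_i^2 ,
\]

and making the right-hand side still larger,

\[
  \left( \sum_{i=2}^{s} \epsilon_i \right)^2 \le s \sum_{i=2}^{s} \epsilon_i^2 ,
\]

whence

\[
  s \sum_{i=2}^{s} \epsilon_i^2 - \left( \sum_{i=2}^{s} \epsilon_i \right)^2 \ge 0 .
\]

Thus since r takes its minimum value when this last expression is zero it follows that

\[
  r \ge \frac{n}{s} .
\]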


Lemma: Let u and v be the values of the non-zero bars of an n-bar histogram with i bars equal
to u and j bars equal to v, where i + j ≤ n, the area of the histogram is iu + jv = A, and the sum
of squares of the histogram is $i u^2 + j v^2 = r A^2 / n$. Then

\[
  u = \frac{A}{i + j} \left( 1 + \sqrt{\frac{j}{i}} \, R \right) \tag{A3}
\]

and

\[
  v = \frac{A}{i + j} \left( 1 - \sqrt{\frac{i}{j}} \, R \right) , \tag{A4}
\]

where

\[
  R^2 = \frac{(i + j) r}{n} - 1 .
\]

Furthermore u ≥ v whenever r ≥ n/(i + j), and v ≥ 0 whenever n/i ≥ r.

Proof: Let us verify the four assertions about u and v. First let us consider the difference u - v:

\[
  u - v = \frac{A}{i + j} \left( \sqrt{\frac{j}{i}} + \sqrt{\frac{i}{j}} \right) R \ge 0 .
\]

This result will be true whenever R is a real number, which is the case whenever $R^2 \ge 0$ or
equivalently whenever r ≥ n/(i + j).

Next let us consider v ≥ 0. This will be true whenever $1 - \sqrt{i/j} \, R \ge 0$ or equivalently
whenever $1 \ge (i/j) R^2$. Substituting for $R^2$ we have $1 \ge (i/j) [(i + j) r / n - 1]$. This
result will be true whenever $j/i + 1 \ge (i + j) r / n$ or equivalently n/i ≥ r.

Next let us consider the sum

\[
  i u + j v = \frac{iA}{i + j} \left( 1 + \sqrt{\frac{j}{i}} \, R \right) + \frac{jA}{i + j} \left( 1 - \sqrt{\frac{i}{j}} \, R \right)
  = \frac{A}{i + j} \left( i + \sqrt{ij} \, R + j - \sqrt{ij} \, R \right) = A .
\]

Finally let us consider the sum of squares:

\[
  i u^2 + j v^2 = \frac{i A^2}{(i + j)^2} \left( 1 + 2 \sqrt{\frac{j}{i}} \, R + \frac{j R^2}{i} \right)
  + \frac{j A^2}{(i + j)^2} \left( 1 - 2 \sqrt{\frac{i}{j}} \, R + \frac{i R^2}{j} \right)
\]
\[
  = \frac{A^2}{(i + j)^2} \left( i + j + (j + i) R^2 \right)
  = \frac{A^2}{i + j} \left( 1 + \frac{(i + j) r}{n} - 1 \right) = \frac{r A^2}{n} .
\]
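
The lemma can be verified numerically for any admissible choice of i, j, and r. The sketch below evaluates Equations A3 and A4 and confirms that the resulting two-level histogram has the required area and sum of squares; the function name and the sample values are illustrative.

```python
import math


def two_level_bars(area, n, r, i, j):
    """Bar heights u (i bars) and v (j bars) from Equations A3 and A4 of the lemma."""
    big_r = math.sqrt((i + j) * r / n - 1)   # R, real when r >= n / (i + j)
    u = area / (i + j) * (1 + math.sqrt(j / i) * big_r)
    v = area / (i + j) * (1 - math.sqrt(i / j) * big_r)
    return u, v


# Sample case: n = 16, r = 5, three bars at u and one bar at v (4 <= r <= 16/3).
area, n, r, i, j = 120.0, 16, 5.0, 3, 1
u, v = two_level_bars(area, n, r, i, j)
print(u, v)
print(i * u + j * v)                             # equals the area A
print(i * u ** 2 + j * v ** 2, r * area ** 2 / n)  # both equal r A^2 / n
```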



Corollary: Let $\xi = v/u$, where u and v are as defined in the lemma. Then $1 \ge \xi \ge 0$ if
n/(i + j) ≤ r ≤ n/i, $\xi = 1$ only for r = n/(i + j), and $\xi = 0$ only for r = n/i.

Proof: If n/(i + j) ≤ r ≤ n/i then u ≥ v ≥ 0. Dividing by u gives $1 \ge \xi \ge 0$. From the
expression for u - v in the proof of the lemma and the definition of R we note that u = v only if
r = n/(i + j), whence $\xi = 1$ only if r = n/(i + j). Also we note that by replacing the symbol ≥
by equality in the proof that v ≥ 0 when n/i ≥ r we obtain v = 0 when n/i = r.


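Theorem 4

Let the real numbers $c_1, c_2, \dots, c_n$ represent a histogram with area A and sum of squares
$r A^2 / n$. Order these numbers so that $c_1 \ge c_2 \ge \dots \ge c_n \ge 0$. Then if
n/s ≤ r ≤ n/(s - 1), where s is an integer between 2 and n, a histogram that has the smallest
possible value for $c_1$ is of the form of the lemma with i = s - 1 and j = 1. The value of $c_1$
under these conditions is

\[
  \frac{A}{s} \left[ 1 + \sqrt{\frac{rs - n}{n(s-1)}} \right] .
\]

Proof: Pick an integer, s, between 2 and n. Then let i = s - 1, j = 1 in Equations A3 and A4 of
the lemma to obtain the histogram:

\[
  \left.
  \begin{aligned}
    q_1 = q_2 = \dots = q_{s-1} &= \frac{A}{s} \left[ 1 + \sqrt{\frac{rs - n}{n(s-1)}} \right] \\
    q_s &= \frac{A}{s} \left[ 1 - \sqrt{\frac{(s-1)(rs - n)}{n}} \right] \\
    q_{s+1} = q_{s+2} = \dots = q_n &= 0
  \end{aligned}
  \right\} \tag{A5}
\]

which must satisfy the hypothesis of the theorem whenever n/s ≤ r ≤ n/(s - 1). To prove that this
histogram assumes the greatest lower bound for the largest histogram bar let the numbers $c_i$
(i = 1, 2, ..., n) represent any other histogram which satisfies the hypothesis of the theorem.
We wish to show that $c_1 \ge q_1$.

Let the $c_i$'s be related to the $q_i$'s so that

\[
  c_i = q_i + q_1 \epsilon_i \qquad (i = 1, 2, \dots, n) , \tag{A6}
\]

where the $\epsilon_i$'s may be either positive, negative, or zero. Then we have

\[
  A = \sum_{i=1}^{n} c_i = \sum_{i=1}^{n} (q_i + q_1 \epsilon_i) = \sum_{i=1}^{n} q_i + q_1 \sum_{i=1}^{n} \epsilon_i
    = A + q_1 \sum_{i=1}^{n} \epsilon_i .
\]

Thus we have

\[
  \sum_{i=1}^{n} \epsilon_i = 0 . \tag{A7}
\]

In this and the succeeding theorems we shall consider a function, f, of n variables defined as

\[
  f = f(\epsilon_1, \epsilon_2, \dots, \epsilon_n) = \frac{1}{q_1^2} \left( \sum_{i=1}^{n} c_i^2 - \sum_{i=1}^{n} q_i^2 \right) .
\]

Substituting for $c_i$ from Equation A6,

\[
  f = \sum_{i=1}^{n} \left[ \left( \frac{q_i}{q_1} + \epsilon_i \right)^2 - \left( \frac{q_i}{q_1} \right)^2 \right] . \tag{A8}
\]

Letting $\xi = q_s / q_1$ (which is consistent with the definition of $\xi$ in the corollary to the
lemma) and substituting in Equation A8 we have

\[
  f = \sum_{i=1}^{s-1} \left[ (1 + \epsilon_i)^2 - 1 \right] + \left[ (\xi + \epsilon_s)^2 - \xi^2 \right] + \sum_{i=s+1}^{n} \epsilon_i^2
\]
\[
  = \sum_{i=1}^{s-1} \left( 1 + 2 \epsilon_i + \epsilon_i^2 - 1 \right) + \left( \xi^2 + 2 \xi \epsilon_s + \epsilon_s^2 - \xi^2 \right) + \sum_{i=s+1}^{n} \epsilon_i^2
\]
\[
  = 2 \sum_{i=1}^{s-1} \epsilon_i + 2 \xi \epsilon_s + \sum_{i=1}^{n} \epsilon_i^2 .
\]

Since

\[
  \sum_{i=1}^{n} \epsilon_i = 0
\]

we have

\[
  \epsilon_s = - \sum_{i=1}^{s-1} \epsilon_i - \sum_{i=s+1}^{n} \epsilon_i ,
\]

whence

\[
  f = 2 \sum_{i=1}^{s-1} \epsilon_i - 2 \xi \left( \sum_{i=1}^{s-1} \epsilon_i + \sum_{i=s+1}^{n} \epsilon_i \right)
      + \sum_{i=1}^{s-1} \epsilon_i^2 - \epsilon_s \left( \sum_{i=1}^{s-1} \epsilon_i + \sum_{i=s+1}^{n} \epsilon_i \right)
      + \sum_{i=s+1}^{n} \epsilon_i^2
\]
\[
  = \sum_{i=1}^{s-1} \epsilon_i \left( 2 - 2 \xi - \epsilon_s \right) + \sum_{i=1}^{s-1} \epsilon_i^2
    + \sum_{i=s+1}^{n} \epsilon_i \left( -2 \xi - \epsilon_s \right) + \sum_{i=s+1}^{n} \epsilon_i^2
\]
\[
  = \sum_{i=1}^{s-1} \epsilon_i \left[ \epsilon_i - \epsilon_s + 2 (1 - \xi) \right]
    + \sum_{i=s+1}^{n} \epsilon_i \left[ \epsilon_i - \epsilon_s - 2 \xi \right] . \tag{A9}
\]

Now assume $\epsilon_1 < 0$. By the ordering hypothesis on the $c_i$'s we have, upon dividing by $q_1$:

\[
  (1 + \epsilon_1) \ge (1 + \epsilon_2) \ge \dots \ge (1 + \epsilon_{s-1}) \ge (\xi + \epsilon_s) \ge \epsilon_{s+1} \ge \dots \ge \epsilon_n \ge 0 ,
\]

so that for i = 1, 2, ..., s - 1 we have $\epsilon_i < 0$ since $\epsilon_1 < 0$. Also for
i = 1, 2, ..., s - 1 we have $1 + \epsilon_i \ge \xi + \epsilon_s$, whence
$\epsilon_i - \epsilon_s + 1 - \xi \ge 0$. From the corollary to the lemma we have $1 - \xi \ge 0$,
so that by adding inequalities we have $[\epsilon_i - \epsilon_s + 2(1 - \xi)] \ge 0$ for
i = 1, 2, ..., s - 1. Thus it follows that the first s - 1 terms in Equation A9 are ≤ 0.

It also follows from the ordering that for i = s + 1, s + 2, ..., n we have $\epsilon_i \ge 0$.
Also for i = s + 1, s + 2, ..., n we have $\xi + \epsilon_s \ge \epsilon_i$. From the corollary we
have $\xi \ge 0$ so that $[\epsilon_i - \epsilon_s - 2 \xi] \le 0$. Combining these results it
follows that f ≤ 0.

Also from the corollary we have $1 - \xi > 0$ if r > n/s, from which it follows that f < 0 since we
already know that f ≤ 0 and the first s - 1 terms of Equation A9 are all < 0.

If r = n/s, the corollary states that $\xi = 1$. In this case the ordering hypothesis gives us
$1 + \epsilon_s \ge \epsilon_i$ for i = s + 1, s + 2, ..., n so that
$\epsilon_i - \epsilon_s - 1 \le 0$, whence $\epsilon_i - \epsilon_s - 2 < 0$. Now if none of the
first s - 1 terms of Equation A9 is < 0 we can conclude that
$\epsilon_i (\epsilon_i - \epsilon_s) = 0$ for i = 1, 2, ..., s - 1 and therefore that
$\epsilon_1 = \epsilon_2 = \dots = \epsilon_s$. Since

\[
  \sum_{i=1}^{n} \epsilon_i = 0 ,
\]

at least one of $\epsilon_{s+1}, \epsilon_{s+2}, \dots, \epsilon_n$ must be > 0 and the
corresponding term of Equation A9 will be < 0. Hence in all cases f < 0. But this implies that

\[
  \sum_{i=1}^{n} c_i^2 < \sum_{i=1}^{n} q_i^2 = \frac{r A^2}{n} ,
\]

which contradicts the assumption that the $c_i$'s satisfy the hypothesis of the theorem. Thus the
assumption that $\epsilon_1 < 0$ must be false, that is, $\epsilon_1 \ge 0$ so that

\[
  c_1 \ge \frac{A}{s} \left[ 1 + \sqrt{\frac{rs - n}{n(s-1)}} \right] .
\]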


Theorem 5

Let the real numbers $c_1, c_2, \dots, c_n$ represent a histogram with area A and sum of squares
$r A^2 / n$. Order these numbers so that $c_1 \ge c_2 \ge \dots \ge c_n \ge 0$. Then a histogram that
has the largest possible value for $c_p$ (p between 1 and n) is of the form of the lemma with i = 1
and j = p - 1 if n/p ≤ r ≤ n for any p ≥ 2, and with i = p and j = n - p if 1 ≤ r ≤ n/p. The
corresponding values of $c_p$ under these conditions are

\[
  \frac{A}{p} \left[ 1 - \sqrt{\frac{pr - n}{n(p-1)}} \right] \qquad \text{if } \frac{n}{p} \le r \le n \text{ for } p \ge 2
\]

and

\[
  \frac{A}{n} \left[ 1 + \sqrt{\frac{(n-p)(r-1)}{p}} \right] \qquad \text{if } 1 \le r \le \frac{n}{p} \text{ for all } p .
\]

Proof: Case I: Let n/p ≤ r ≤ n and let i = 1 and j = p - 1 in Equations A3 and A4 of the
lemma to obtain the histogram:

\[
  \left.
  \begin{aligned}
    q_1 &= \frac{A}{p} \left[ 1 + \sqrt{\frac{(p-1)(pr - n)}{n}} \right] \\
    q_2 = q_3 = \dots = q_p &= \frac{A}{p} \left[ 1 - \sqrt{\frac{pr - n}{n(p-1)}} \right] \\
    q_{p+1} = q_{p+2} = \dots = q_n &= 0
  \end{aligned}
  \right\} \tag{A10}
\]

which must satisfy the hypothesis of the theorem whenever n/p ≤ r ≤ n. Let the numbers
$c_i$ (i = 1, 2, ..., n) represent any other histogram which satisfies the hypothesis of the
theorem. We wish to show that $c_p \le q_p$. If we define the numbers $\epsilon_i$
(i = 1, 2, ..., n) as in Equation A6 using the $q_i$'s as defined in Equation A10 we conclude that
Equation A7 holds by the same argument as before. We shall also define the function f as before
(Equation A8) and noting that $q_p / q_1 = \xi$ we obtain

\[
  f = \left[ (1 + \epsilon_1)^2 - 1 \right] + \sum_{i=2}^{p} \left[ (\xi + \epsilon_i)^2 - \xi^2 \right] + \sum_{i=p+1}^{n} \epsilon_i^2
    = \epsilon_1 (2 + \epsilon_1) + \sum_{i=2}^{p} \epsilon_i (2 \xi + \epsilon_i) + \sum_{i=p+1}^{n} \epsilon_i^2 .
\]

From

\[
  \sum_{i=1}^{n} \epsilon_i = 0
\]

we have

\[
  \epsilon_1 = - \sum_{i=2}^{n} \epsilon_i .
\]

Substituting this we have

\[
  f = -2 \sum_{i=2}^{n} \epsilon_i + \left( \sum_{i=2}^{n} \epsilon_i \right)^2 + 2 \xi \sum_{i=2}^{p} \epsilon_i
      + \sum_{i=2}^{p} \epsilon_i^2 + \sum_{i=p+1}^{n} \epsilon_i^2
\]

and, on collecting terms,

\[
  f = \sum_{i=2}^{p} \epsilon_i \left[ \sum_{j=2}^{n} \epsilon_j - (1 - \xi) \right]
    + \sum_{i=p+1}^{n} \epsilon_i \left[ \sum_{j=2}^{n} \epsilon_j - 1 \right]
    + \sum_{i=2}^{p} \epsilon_i \left[ \epsilon_i - (1 - \xi) \right]
    + \sum_{i=p+1}^{n} \epsilon_i (\epsilon_i - 1) . \tag{A11}
\]

Since the $c_i$'s must satisfy the ordering hypothesis of the theorem, we have upon dividing by $q_1$

\[
  (1 + \epsilon_1) \ge (\xi + \epsilon_2) \ge \dots \ge (\xi + \epsilon_p) \ge \epsilon_{p+1} \ge \dots \ge \epsilon_n \ge 0 .
\]

Thus $\epsilon_i \ge 0$ for i = p + 1, p + 2, ..., n. Furthermore
$\epsilon_2 \ge \epsilon_3 \ge \dots \ge \epsilon_p$. Assume $\epsilon_p > 0$. Then
$\epsilon_i > 0$ for i = 2, 3, ..., p. Thus in Equation A11 the left factor in each term is at least
non-negative. Also, since $1 + \epsilon_1 \ge \xi + \epsilon_2$ and
$\epsilon_1 = - \sum_{j=2}^{n} \epsilon_j$,

\[
  1 - \xi \ge \epsilon_2 + \sum_{j=2}^{n} \epsilon_j ,
\]

whence

\[
  \sum_{j=2}^{n} \epsilon_j - (1 - \xi) < 0 .
\]

From the corollary to the lemma it follows that $\xi \ge 0$ so that

\[
  \sum_{j=2}^{n} \epsilon_j - 1 < 0
\]

also.

Similarly $[\epsilon_i - (1 - \xi)] < 0$ for i = 2, ..., p and $(\epsilon_i - 1) < 0$ for
i = p + 1, ..., n. From these results it follows that f < 0, since the right-hand factor in each
term of Equation A11 is negative and the left-hand factor in the first term of Equation A11 is
positive.

But then,

\[
  \sum_{i=1}^{n} c_i^2 < \sum_{i=1}^{n} q_i^2 = \frac{r A^2}{n} ,
\]

which is a contradiction. Thus $\epsilon_p \le 0$ so that $c_p \le q_p$.

Case II: Let 1 ≤ r ≤ n/p and let i = p and j = n - p in Equations A3 and A4. We then obtain the
histogram:

\[
  \left.
  \begin{aligned}
    q_1 = q_2 = \dots = q_p &= \frac{A}{n} \left[ 1 + \sqrt{\frac{(n-p)(r-1)}{p}} \right] \\
    q_{p+1} = q_{p+2} = \dots = q_n &= \frac{A}{n} \left[ 1 - \sqrt{\frac{p(r-1)}{n-p}} \right]
  \end{aligned}
  \right\} \tag{A12}
\]

which satisfies the hypothesis of the theorem whenever 1 ≤ r ≤ n/p. As in Case I we let the
numbers $c_i$ (i = 1, 2, ..., n) represent any other histogram which satisfies the hypothesis of
the theorem, and define quantities $\epsilon_i$ (i = 1, 2, ..., n) as in Equation A6. We again wish
to show $c_p \le q_p$ and so proceed as before to show that Equation A7 holds and to define the
function, f, using Equation A8. By substituting $q_n / q_1 = \xi$,

\[
  f = \sum_{i=1}^{p} \left[ (1 + \epsilon_i)^2 - 1 \right] + \sum_{i=p+1}^{n} \left[ (\xi + \epsilon_i)^2 - \xi^2 \right]
    = \sum_{i=1}^{p} \left( 2 \epsilon_i + \epsilon_i^2 \right) + \sum_{i=p+1}^{n} \left( 2 \xi \epsilon_i + \epsilon_i^2 \right)
\]
\[
  = \sum_{i=1}^{n} \epsilon_i^2 + 2 \sum_{i=1}^{p} \epsilon_i + 2 \xi \sum_{i=p+1}^{n} \epsilon_i .
\]

If we now substitute

\[
  \sum_{i=p+1}^{n} \epsilon_i = - \sum_{i=1}^{p} \epsilon_i
\]

it follows that

\[
  f = \sum_{i=1}^{n} \epsilon_i^2 + 2 (1 - \xi) \sum_{i=1}^{p} \epsilon_i . \tag{A13}
\]

Assume now that $\epsilon_p > 0$. Then from the ordering hypothesis it follows that for
i = 1, 2, ..., p we have $\epsilon_i > 0$, whence

\[
  \sum_{i=1}^{p} \epsilon_i > 0 .
\]

But from the corollary it follows that $1 - \xi \ge 0$ so that the second term of Equation A13 is
≥ 0. Clearly the first term of Equation A13 is positive since each term in the summation is ≥ 0 and
the first p terms are actually > 0. Thus we have f > 0.

But this implies that

\[
  \sum_{i=1}^{n} c_i^2 > \sum_{i=1}^{n} q_i^2 = \frac{r A^2}{n} ,
\]

which is a contradiction. Therefore the assumption that $\epsilon_p > 0$ must be false, that is,
$\epsilon_p \le 0$, which implies:

\[
  c_p \le q_p .
\]


Theorem 6

Let the real numbers $c_1, c_2, \dots, c_n$ represent a histogram with area A and sum of squares
$r A^2 / n$. Order these numbers so that $c_1 \ge c_2 \ge \dots \ge c_n \ge 0$. Then if
1 ≤ r ≤ n/(p - 1), a histogram that has the smallest possible value for $c_p$ (p between 2 and n)
is of the form of the lemma with i = p - 1 and j = n - p + 1. The corresponding value for $c_p$
under these conditions is

\[
  \frac{A}{n} \left[ 1 - \sqrt{\frac{(p-1)(r-1)}{n-p+1}} \right] .
\]

Proof: Let 1 ≤ r ≤ n/(p - 1) and let i = p - 1 and j = n - p + 1 in Equations A3 and A4. We have
thereby obtained the histogram:

\[
  \left.
  \begin{aligned}
    q_1 = q_2 = \dots = q_{p-1} &= \frac{A}{n} \left[ 1 + \sqrt{\frac{(n-p+1)(r-1)}{p-1}} \right] \\
    q_p = q_{p+1} = \dots = q_n &= \frac{A}{n} \left[ 1 - \sqrt{\frac{(p-1)(r-1)}{n-p+1}} \right]
  \end{aligned}
  \right\} \tag{A14}
\]

which indeed satisfies the hypothesis of the theorem whenever 1 ≤ r ≤ n/(p - 1). If we note here
that this histogram is the same as that of Case II of Theorem 5 with p - 1 substituted for p we may
proceed similarly to obtain:

\[
  f = \sum_{i=1}^{n} \epsilon_i^2 + 2 \sum_{i=1}^{p-1} \epsilon_i + 2 \xi \sum_{i=p}^{n} \epsilon_i ,
\]

where here $\xi = q_p / q_1$. Since

\[
  \sum_{i=1}^{n} \epsilon_i = 0
\]

we can substitute

\[
  \sum_{i=1}^{p-1} \epsilon_i = - \sum_{i=p}^{n} \epsilon_i ,
\]

so that

\[
  f = \sum_{i=1}^{n} \epsilon_i^2 - 2 (1 - \xi) \sum_{i=p}^{n} \epsilon_i . \tag{A15}
\]

The first term is clearly non-negative. Assume $\epsilon_p < 0$. Then from the ordering hypothesis
it follows that $0 > \epsilon_p \ge \epsilon_{p+1} \ge \dots \ge \epsilon_n$, whence

\[
  \sum_{i=p}^{n} \epsilon_i < 0 .
\]

From the corollary, it follows that $1 - \xi \ge 0$ so that the second term of Equation A15 is ≥ 0.
The first term of Equation A15 is clearly > 0 since each term of the summation is at least ≥ 0 and
those terms corresponding to i = p, p + 1, ..., n are > 0.

Thus f > 0. But this implies that

\[
  \sum_{i=1}^{n} c_i^2 > \sum_{i=1}^{n} q_i^2 = \frac{r A^2}{n} ,
\]

which is a contradiction. Thus the assumption that $\epsilon_p < 0$ is false; that is,
$\epsilon_p \ge 0$. This implies that $c_p \ge q_p$.